Two Easy Improvements to Lexical Weighting

Authors

David Chiang

Steve DeNeefe

Michael Pust

Abstract

We introduce two simple improvements to the lexical weighting features of Koehn, Och, and Marcu (2003) for machine translation: one which smooths the probability of translating word f to word e by simplifying English morphology, and one which conditions it on the kind of training data that f and e co-occurred in. These new variations lead to improvements of up to +0.8 BLEU, with an average improvement of +0.6 BLEU across two language pairs, two genres, and two translation systems.
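For orientation, the baseline feature that the abstract builds on can be sketched as follows. This is a minimal illustration of lexical weighting in the style of Koehn, Och, and Marcu (2003), not the authors' implementation, and it does not include either of the two improvements. The function names (`estimate_word_probs`, `lexical_weight`), the `NULL` token convention, and the toy data are assumptions made for this example; only one translation direction is shown.

```python
from collections import defaultdict


def estimate_word_probs(aligned_pairs):
    """Relative-frequency estimate of w(f | e) from word-aligned (f, e) pairs."""
    pair_counts = defaultdict(float)
    e_counts = defaultdict(float)
    for f, e in aligned_pairs:
        pair_counts[(f, e)] += 1.0
        e_counts[e] += 1.0
    return {(f, e): c / e_counts[e] for (f, e), c in pair_counts.items()}


def lexical_weight(f_words, e_words, alignment, w, null="NULL"):
    """Lexical weight of a phrase pair under a word alignment.

    alignment is a set of (i, j) links: f_words[i] is aligned to e_words[j].
    For each f word, average w(f | e) over the e words it is linked to
    (or use the NULL token if it is unaligned), then multiply the averages.
    """
    score = 1.0
    for i, f in enumerate(f_words):
        linked = [e_words[j] for (fi, j) in alignment if fi == i]
        if not linked:
            linked = [null]
        score *= sum(w.get((f, e), 0.0) for e in linked) / len(linked)
    return score


# Toy word-aligned data (hypothetical, for illustration only).
pairs = [("maison", "house"), ("maison", "house"),
         ("domicile", "house"), ("bleue", "blue")]
w = estimate_word_probs(pairs)
score = lexical_weight(["maison", "bleue"], ["blue", "house"],
                       {(0, 1), (1, 0)}, w)
```

Because `w` is estimated from raw co-occurrence counts, rare word pairs get unreliable estimates. That sparsity is the motivation for smoothing and conditioning the word translation probabilities.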

Related resources

The two be's of English

This qualitative study investigates the uses of be in Contemporary English. Based on this study, one easy claim and one more difficult claim are proposed. The easy claim is that the traditional distinction between be as a lexical verb and be as an auxiliary is faulty. In particular, 'copular-be', traditionally considered to be a lexical verb, is in fact a prototypi...


Semantic Feature Analysis Treatment for Anomia of Two Nonfluent Persian-Speaking Aphasic Patients

Objectives: Semantic Feature Analysis was designed to improve lexical retrieval in aphasic patients by activating the semantic networks of words. In this approach, anomic patients are cued with semantic information to assist oral naming. The purpose of this study was to examine the effects of Semantic Feature Analysis treatment on the anomia of two nonfluent aphasic patients. Methods: A...


Combining lexical and statistical translation evidence for cross-language information retrieval

This paper explores how best to use lexical and statistical translation evidence together for Cross-Language Information Retrieval (CLIR). Lexical translation evidence is assembled from Wikipedia and from a large machine-readable dictionary, statistical translation evidence is drawn from parallel corpora, and evidence from co-occurrence in the document language provides a basis for limiting the ...


Workshop Notes of the ECML / MLnet Workshop on Empirical Learning of Natural Language Processing Tasks

This paper analyses the relation between the use of similarity in Memory-Based Learning and the notion of backed-off smoothing in statistical language modeling. We show that the two approaches are closely related, and we argue that feature weighting methods in the Memory-Based paradigm can offer the advantage of automatically specifying a suitable domain-specific hierarchy between most specific and...


Topic Models for Dynamic Translation Model Adaptation

We propose an approach that biases machine translation systems toward relevant translations based on topic-specific contexts, where topics are induced in an unsupervised way using topic models; this can be thought of as inducing subcorpora for adaptation without any human annotation. We use these topic distributions to compute topic-dependent lexical weighting probabilities and directly incorpo...


Publication date: 2011
